Streaming data pipelines and real-time analytics
Streaming is one of the most popular data pipeline design patterns. Treating each event as a single data point creates a constant flow of data from one point to another, which opens up opportunities for real-time data ingestion and analytics. If you want to familiarise yourself with data streaming and learn how to build real-time data pipelines, this story is for you. You will learn how to test the solution and how to mock test data to simulate event streams. This article is a great opportunity to acquire some sought-after data engineering skills while working with popular streaming tools and frameworks, e.g. Kinesis, Kafka and Spark. I would like to speak about the benefits, examples and use cases of data streaming.
What exactly is data streaming?
Streaming data, also known as event stream processing, is a data pipeline design pattern in which data points flow constantly from the source to the destination. The data can be processed in real-time, enabling real-time analytics: the ability to act on data streams and analytics events with very low latency. Thanks to stream processing, applications can trigger immediate responses to new data events, and it is typically one of the most popular ways to process data at an enterprise level.
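To make the idea more concrete, here is a minimal sketch of event-by-event processing with a Kafka consumer. It assumes a local Kafka broker, the kafka-python library and a topic called payment_events; all of these names are illustrative, not part of any particular setup.

```python
# A minimal sketch of event-by-event stream processing.
# Assumes a local Kafka broker and a topic named "payment_events" (hypothetical names).
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "payment_events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Each event is handled as soon as it arrives, rather than in periodic batches.
for message in consumer:
    event = message.value
    if event.get("amount", 0) > 1000:
        print(f"High-value transaction detected: {event}")
```

The point of the pattern is in the loop: every event is acted on the moment it lands in the stream, which is what makes real-time analytics possible.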
There is a data pipeline whenever there is data processing between points A and B [1].
In this example, we can create an ELT streaming data pipeline into AWS Redshift. An AWS Firehose delivery stream offers this type of seamless integration, feeding data directly into a data warehouse table. The data is then transformed to create reports with AWS QuickSight as a BI tool.
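As a hedged sketch of the ingestion side of that pipeline, the snippet below pushes a single event into an existing Firehose delivery stream using boto3. The stream name "revenue-events", the region and the event schema are assumptions for illustration; the delivery stream itself would be configured separately to load into Redshift.

```python
# Sketch: send one event into an existing Firehose delivery stream that loads into Redshift.
# The stream name "revenue-events" and the event fields are illustrative assumptions.
import json
import boto3

firehose = boto3.client("firehose", region_name="eu-west-1")

event = {"user_id": 42, "amount": 19.99, "currency": "USD"}

# Firehose buffers incoming records and delivers them to the configured Redshift table.
firehose.put_record(
    DeliveryStreamName="revenue-events",
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```

Once records land in Redshift, the transformation step happens in the warehouse (the "T" in ELT), and QuickSight reads the resulting tables for reporting.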
Let's imagine we need to create a reporting dashboard to display revenue streams in our company. In many scenarios, the business requirement is to generate insights in real-time. This is exactly the case when we would want to use streaming.
Data streams can be generated by various data sources, e.g. IoT devices, server data streams, marketing in-app events, user activity, payment transactions…
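When one of these sources isn't available during development, we can mock test data to simulate an event stream, as mentioned above. Below is a simple generator of synthetic payment transaction events; the schema is purely illustrative, and the output could be piped into Kafka, Kinesis or Firehose in place of a real source.

```python
# A simple way to mock test data and simulate an event stream locally.
# The event schema here is an illustrative assumption, not a fixed standard.
import json
import random
import time
import uuid
from datetime import datetime, timezone

def generate_payment_event() -> dict:
    """Return one synthetic payment transaction event."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_time": datetime.now(timezone.utc).isoformat(),
        "user_id": random.randint(1, 1000),
        "amount": round(random.uniform(1, 500), 2),
        "currency": random.choice(["USD", "EUR", "GBP"]),
    }

if __name__ == "__main__":
    # Emit a couple of events per second to stand in for a real data source.
    while True:
        print(json.dumps(generate_payment_event()))
        time.sleep(0.5)
```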